LongReasonArena: A Long Reasoning Benchmark for Large Language Models
Ding, Jiayu, Ma, Shuming, Cui, Lei, Zheng, Nanning, Wei, Furu
Existing long-context benchmarks for Large Language Models (LLMs) focus on evaluating comprehension of long inputs, while overlooking the evaluation of long reasoning abilities. To address this gap, we introduce LongReasonArena, a benchmark specifically designed to assess the long reasoning capabilities of LLMs. Our tasks require models to solve problems by executing multi-step algorithms that reflect key aspects of long reasoning, such as retrieval and backtracking. By controlling the inputs, the required reasoning length can be arbitrarily scaled, reaching up to 1 million tokens of reasoning for the most challenging tasks. Extensive evaluation results demonstrate that LongReasonArena presents a significant challenge for both open-source and proprietary LLMs. For instance, DeepSeek-R1 achieves only 7.5% accuracy on our task. Further analysis also reveals that the accuracy exhibits a linear decline with respect to the logarithm of the expected number of reasoning steps. Our code and data are available at https://github.com/LongReasonArena/LongReasonArena.
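The reported trend, accuracy falling linearly in the logarithm of the expected number of reasoning steps, can be sketched as a simple curve. The coefficients below are hypothetical illustrations, not values fitted to the paper's results:

```python
import math

def projected_accuracy(steps, a=1.0, b=0.16):
    """Illustrative model of the reported trend: accuracy declines
    linearly with log10(expected reasoning steps), clamped to [0, 1].
    The intercept a and slope b are made-up placeholders."""
    return max(0.0, min(1.0, a - b * math.log10(steps)))
```

Under this toy parameterization, accuracy drops from 1.0 at a single step toward a few percent near one million steps, mirroring the qualitative shape of the finding.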
LIFT: Improving Long Context Understanding of Large Language Models through Long Input Fine-Tuning
Mao, Yansheng, Xu, Yufei, Li, Jiaqi, Meng, Fanxu, Yang, Haotong, Zheng, Zilong, Wang, Xiyuan, Zhang, Muhan
Long context understanding remains challenging for large language models due to their limited context windows. This paper presents Long Input Fine-Tuning (LIFT), a novel framework for long-context modeling that can improve the long-context performance of arbitrary (short-context) LLMs by dynamically adapting model parameters based on the long input. Importantly, rather than endlessly extending the context window size to accommodate increasingly long inputs in context, LIFT stores and absorbs the long input in the model's parameters. By fine-tuning the long input into model parameters, LIFT allows short-context LLMs to answer questions even when the required information is not provided in the context during inference. Furthermore, to enhance LIFT performance while maintaining the original in-context learning (ICL) capabilities, we introduce Gated Memory, a specialized attention adapter that automatically balances long input memorization and ICL. We provide a comprehensive analysis of the strengths and limitations of LIFT on long context understanding, offering valuable directions for future research.
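Test-time fine-tuning on a long input presupposes splitting that input into overlapping segments that fit a short context window. A minimal sketch of such a segmentation schedule, with illustrative (not LIFT's actual) segment length and overlap:

```python
def lift_segments(tokens, seg_len=512, overlap=128):
    """Split a long token sequence into overlapping segments suitable
    for test-time language-modeling updates, LIFT-style. The overlap
    lets each segment carry context from its predecessor; seg_len and
    overlap are illustrative hyperparameters."""
    step = seg_len - overlap
    segments = []
    for start in range(0, max(len(tokens) - overlap, 1), step):
        segments.append(tokens[start:start + seg_len])
    return segments
```

Each segment would then serve as one fine-tuning example, so the model absorbs the whole input even though no single forward pass sees more than seg_len tokens.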
LIFT: Improving Long Context Understanding Through Long Input Fine-Tuning
Mao, Yansheng, Li, Jiaqi, Meng, Fanxu, Xiong, Jing, Zheng, Zilong, Zhang, Muhan
Long context understanding remains challenging for large language models due to their limited context windows. This paper introduces Long Input Fine-Tuning (LIFT) for long context modeling, a novel framework that enhances LLM performance on long-context tasks by adapting model parameters to the context at test time. LIFT enables efficient processing of lengthy inputs without the computational burden of offline long-context adaptation, and can improve the long-context capabilities of arbitrary short-context models. The framework is further enhanced by integrating in-context learning and pre-LIFT supervised fine-tuning. The combination of in-context learning and LIFT enables short-context models like Llama 3 to handle arbitrarily long contexts and consistently improves their performance on popular long-context benchmarks like LooGLE and LongBench. We also provide a comprehensive analysis of the strengths and limitations of LIFT on long context understanding, offering valuable directions for future research.

Large Language Models (LLMs), such as GPT-4 (Achiam et al., 2023), have revolutionized the field of natural language processing, driving breakthroughs in text generation and significant advancements in tasks like translation, summarization, and conversation. Lengthy sequences, which can span up to millions of tokens, are common in real-world applications including long books (Kočiský et al., 2018), high-resolution videos (Wu et al., 2024; Tapaswi et al., 2016), and audio signals (Yang et al., 2024). Extending the context window allows models to capture dependencies across larger text spans and improve coherence, understanding, and accuracy in tasks that require reasoning over extended inputs.
SEGMENT+: Long Text Processing with Short-Context Language Models
Shi, Wei, Li, Shuang, Yu, Kerun, Chen, Jinglei, Liang, Zujie, Wu, Xinhui, Qian, Yuxi, Wei, Feng, Zheng, Bo, Liang, Jiaqing, Chen, Jiangjie, Xiao, Yanghua
There is a growing interest in expanding the input capacity of language models (LMs) across various domains. However, simply increasing the context window does not guarantee robust performance across diverse long-input processing tasks, such as understanding extensive documents and extracting detailed information from lengthy and noisy data. In response, we introduce SEGMENT+, a general framework that enables LMs to handle extended inputs within limited context windows efficiently. SEGMENT+ utilizes structured notes and a filtering module to manage information flow, resulting in a system that is both controllable and interpretable. Our extensive experiments across various model sizes, focusing on long-document question-answering and Needle-in-a-Haystack tasks, demonstrate the effectiveness of SEGMENT+ in improving performance.
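The segment-note-filter flow described above can be sketched end to end. Everything here is a toy stand-in: the "note" is a segment prefix and the filtering module is naive keyword overlap, purely to make the information flow concrete:

```python
def segment_plus(document, question, seg_len=200, top_k=2):
    """Toy sketch of a SEGMENT+-style pipeline: split the document into
    segments, write a structured note per segment, score each note for
    relevance to the question, and keep only the top-k notes for the
    final answer prompt. The scoring is naive keyword overlap and the
    note is a truncated prefix; both are placeholders for the paper's
    learned components."""
    words = document.split()
    segments = [" ".join(words[i:i + seg_len]) for i in range(0, len(words), seg_len)]
    q_terms = set(question.lower().split())
    notes = []
    for idx, seg in enumerate(segments):
        score = len(q_terms & set(seg.lower().split()))
        notes.append({"segment": idx, "note": seg[:80], "score": score})
    # Filtering module: retain only the most relevant notes.
    return sorted(notes, key=lambda n: n["score"], reverse=True)[:top_k]
```

Because only short notes (not raw segments) reach the final prompt, the LM's limited window is spent on filtered, relevant content, which is what makes the system controllable and interpretable.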
Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG
Jin, Bowen, Yoon, Jinsung, Han, Jiawei, Arik, Sercan O.
Retrieval-augmented generation (RAG) empowers large language models (LLMs) to utilize external knowledge sources. The increasing capacity of LLMs to process longer input sequences opens up avenues for providing more retrieved information, to potentially enhance the quality of generated outputs. It is plausible to assume that a larger retrieval set would contain more relevant information (higher recall), which might result in improved performance. However, our empirical findings demonstrate that for many long-context LLMs, the quality of generated output initially improves, but then declines as the number of retrieved passages increases. This paper investigates this phenomenon, identifying the detrimental impact of retrieved "hard negatives" as a key contributor. To mitigate this and enhance the robustness of long-context LLM-based RAG, we propose both training-free and training-based approaches. We first showcase the effectiveness of retrieval reordering as a simple yet powerful training-free optimization. Furthermore, we explore training-based methods, specifically RAG-specific implicit LLM fine-tuning and RAG-oriented fine-tuning with intermediate reasoning, demonstrating their capacity for substantial performance gains. Finally, we conduct a systematic analysis of design choices for these training-based methods, including data distribution, retriever selection, and training context length.
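One plausible reading of training-free retrieval reordering is to place the strongest passages at the two ends of the context, where long-context LLMs attend most reliably, pushing likely hard negatives toward the middle. A sketch of that layout (not necessarily the paper's exact procedure):

```python
def reorder_passages(passages_with_scores):
    """Reorder (passage, relevance_score) pairs so the highest-scoring
    passages sit at the two ends of the context and the weakest land in
    the middle. This 'ends-first' layout is an illustrative reading of
    retrieval reordering, aimed at the lost-in-the-middle effect."""
    ranked = sorted(passages_with_scores, key=lambda p: p[1], reverse=True)
    front, back = [], []
    for i, item in enumerate(ranked):
        # Alternate the ranked list between the front and the back.
        (front if i % 2 == 0 else back).append(item)
    return front + back[::-1]
```

With four passages scored 0.9, 0.7, 0.5, 0.3, the two strongest end up first and last, and the weaker pair fills the middle slots.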
Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs
Song, Woomin, Oh, Seunghyuk, Mo, Sangwoo, Kim, Jaehyung, Yun, Sukmin, Ha, Jung-Woo, Shin, Jinwoo
Large language models (LLMs) have shown remarkable performance in various natural language processing tasks. However, a primary constraint they face is the context limit, i.e., the maximum number of tokens they can process. Previous works have explored architectural changes and modifications in positional encoding to relax the constraint, but they often require expensive training or do not address the computational demands of self-attention. In this paper, we present Hierarchical cOntext MERging (HOMER), a new training-free scheme designed to overcome these limitations. HOMER uses a divide-and-conquer algorithm, dividing long inputs into manageable chunks. Each chunk is then processed collectively, employing a hierarchical strategy that merges adjacent chunks at progressive transformer layers. A token reduction technique precedes each merging, ensuring memory usage efficiency. We also propose an optimized computational order reducing the memory requirement to logarithmically scale with respect to input length, making it especially favorable for environments with tight memory restrictions. Our experiments demonstrate the proposed method's superior performance and memory efficiency, enabling the broader use of LLMs in settings requiring extended context. Code is available at https://github.com/alinlab/HOMER.
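The merge schedule, prune each chunk, then concatenate adjacent pairs, halving the chunk count per level, can be sketched over plain token lists. The pruning rule here (keep the first tokens) is a placeholder; HOMER prunes by importance inside the transformer layers:

```python
def homer_merge(chunks, reduce_ratio=0.5):
    """Toy sketch of HOMER-style hierarchical merging: at each level,
    every chunk is pruned to a fraction of its tokens (standing in for
    the token-reduction step), then adjacent chunks are concatenated,
    halving the number of chunks until one remains. Keeping the first
    tokens is a placeholder pruning rule, not HOMER's importance-based
    selection."""
    while len(chunks) > 1:
        pruned = [c[: max(1, int(len(c) * reduce_ratio))] for c in chunks]
        merged = []
        for i in range(0, len(pruned), 2):
            pair = pruned[i] + (pruned[i + 1] if i + 1 < len(pruned) else [])
            merged.append(pair)
        chunks = merged
    return chunks[0]
```

Note that with reduce_ratio=0.5 the merged chunk never grows beyond the original chunk length, which is the property that keeps per-level memory bounded while the covered span doubles.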
CoLT5: Faster Long-Range Transformers with Conditional Computation
Ainslie, Joshua, Lei, Tao, de Jong, Michiel, Ontañón, Santiago, Brahma, Siddhartha, Zemlyanskiy, Yury, Uthus, David, Guo, Mandy, Lee-Thorp, James, Tay, Yi, Sung, Yun-Hsuan, Sanghai, Sumit
Many natural language processing tasks benefit from long inputs, but processing long documents with Transformers is expensive -- not only due to quadratic attention complexity but also from applying feedforward and projection layers to every token. However, not all tokens are equally important, especially for longer documents. We propose CoLT5, a long-input Transformer model that builds on this intuition by employing conditional computation, devoting more resources to important tokens in both feedforward and attention layers. We show that CoLT5 achieves stronger performance than LongT5 with much faster training and inference, achieving SOTA on the long-input SCROLLS benchmark. Moreover, CoLT5 can effectively and tractably make use of extremely long inputs, showing strong gains up to 64k input length.
Google's CoLT5 Processes Extremely Long Inputs via Conditional Computation
One of the highlights of OpenAI's GPT-4 large language model (LLM) is its expanded context window size of 32,000 tokens (about 25,000 words), which enables longer input sequences and conversations than ChatGPT's 4,000 token limit. While expanding the processing capacities of transformer-based LLMs in this way is beneficial, it is also computationally costly due to the quadratic complexity of the models' attention mechanisms and the application of feedforward and projection layers to every token. A Google Research team addresses this issue in the new paper CoLT5: Faster Long-Range Transformers with Conditional Computation, proposing CoLT5 (Conditional LongT5), a family of transformer models that apply a novel conditional computation approach for higher quality and faster long-input processing of up to 64,000 tokens. CoLT5 is built on Google's LongT5 (Guo et al., 2022), which simultaneously scales input length and model size to improve long-input processing in transformers; and is inspired by the idea that better performance and reduced computation cost can be achieved via a novel "conditional computation" approach that allocates more computation to important tokens. The conditional computation mechanism comprises three main components: 1) Routing modules, which select important tokens at each attention or feedforward layer; 2) A conditional feedforward layer that applies an additional high-capacity feedforward layer to select important routed tokens; and 3) A conditional attention layer that enables CoLT5 to differentiate between tokens that require additional information and those that already possess such information.
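The route-then-compute pattern behind the conditional feedforward layer can be sketched in a few lines. The light and heavy paths below are stand-in lambdas, and the router is a simple top-k over precomputed scores, all hypothetical simplifications of CoLT5's learned components:

```python
def conditional_ffn(tokens, scores, k=2, light=lambda x: x + 1, heavy=lambda x: x * 10):
    """Toy sketch of CoLT5-style conditional computation: every token
    goes through a cheap 'light' feedforward path, while a router
    (here, the k highest scores) sends only the important tokens
    through an extra high-capacity 'heavy' path whose output is added
    on top. light/heavy are placeholder functions, not real layers."""
    important = set(sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k])
    out = []
    for i, t in enumerate(tokens):
        y = light(t)
        if i in important:
            y += heavy(t)  # extra compute spent only on routed tokens
        out.append(y)
    return out
```

The cost saving comes from the heavy path touching only k tokens regardless of sequence length, which is why the approach pays off most on long inputs where important tokens are sparse.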
ETC: Encoding Long and Structured Data in Transformers
Ainslie, Joshua, Ontanon, Santiago, Alberti, Chris, Pham, Philip, Ravula, Anirudh, Sanghai, Sumit
Transformer-based models have pushed the state of the art in many natural language processing tasks. However, one of their main limitations is the quadratic computational and memory cost of the standard attention mechanism. In this paper, we present a new family of Transformer models, which we call the Extended Transformer Construction (ETC), that allows for significant increases in input sequence length by introducing a new global-local attention mechanism between a global memory and the standard input tokens. We also show that combining global-local attention with relative position encodings allows ETC to handle structured data with ease. Empirical results on the Natural Questions data set show the promise of the approach.
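The global-local pattern can be made concrete as a boolean attention mask: global memory tokens see everything, and regular input tokens see all global tokens plus a local neighborhood. This is a minimal sketch of the idea, with the layout (global tokens first) and radius chosen for illustration:

```python
def etc_attention_mask(n_global, n_input, radius=1):
    """Sketch of an ETC-style global-local attention mask over a
    sequence of n_global global memory tokens followed by n_input
    regular input tokens. Global tokens attend everywhere (and are
    attended to by everyone); input tokens additionally attend to
    input neighbors within a local radius. mask[i][j] means token i
    may attend to token j."""
    n = n_global + n_input
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i < n_global or j < n_global:
                mask[i][j] = True   # global row or column: full access
            elif abs(i - j) <= radius:
                mask[i][j] = True   # local band among input tokens
    return mask
```

Because the input-to-input block is a narrow band and only the thin global strip is dense, the number of attended pairs grows roughly linearly in sequence length instead of quadratically.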